diff --git a/docs/uri-scheme-browse-directory.rst b/docs/uri-scheme-browse-directory.rst index f8d63b53..2f20c040 100644 --- a/docs/uri-scheme-browse-directory.rst +++ b/docs/uri-scheme-browse-directory.rst @@ -1,31 +1,65 @@ Directory ^^^^^^^^^ -.. http:get:: /browse/directory/(sha1_git)/[(path)/] +.. http:get:: /browse/directory/(sha1_git)/ + + HTML view for browsing the content of a directory reachable from + the provided root one (including itself) identified by its **sha1_git** value. + + The content of the directory is first sorted in lexicographical order + and the sub-directories are displayed before the regular files. + + The view enables to navigate from the requested directory to + directories reachable from it in a recursive way but also + up to the root directory. + A breadcrumb located in the top part of the view allows + to keep track of the paths navigated so far. + + :param string sha1_git: hexadecimal representation for the **sha1_git** identifier + of the directory to browse + :query string path: optional parameter used to specify the path of a directory + reachable from the provided root one + :statuscode 200: no error + :statuscode 400: an invalid **sha1_git** value has been provided + :statuscode 404: requested directory can not be found in the archive + + **Examples:** + + .. parsed-literal:: + + :swh_web_browse:`directory/977fc4b98c0e85816348cebd3b12026407c368b6/` + :swh_web_browse:`directory/9650ed370c0330d2cd2b6fd1e9febf649ffe538d/?path=kernel/sched` + + +.. http:get:: /browse/directory/(sha1_git)/(path)/ + :deprecated: + + .. warning:: + That endpoint is deprecated, use :http:get:`/browse/directory/(sha1_git)/` instead. HTML view for browsing the content of a directory reachable from the provided root one (including itself) identified by its **sha1_git** value. The content of the directory is first sorted in lexicographical order and the sub-directories are displayed before the regular files. 
The view enables to navigate from the requested directory to directories reachable from it in a recursive way but also up to the root directory. A breadcrumb located in the top part of the view allows to keep track of the paths navigated so far. :param string sha1_git: hexadecimal representation for the **sha1_git** identifier of the directory to browse :param string path: optional parameter used to specify the path of a directory reachable from the provided root one :statuscode 200: no error :statuscode 400: an invalid **sha1_git** value has been provided :statuscode 404: requested directory can not be found in the archive **Examples:** .. parsed-literal:: :swh_web_browse:`directory/977fc4b98c0e85816348cebd3b12026407c368b6/` :swh_web_browse:`directory/9650ed370c0330d2cd2b6fd1e9febf649ffe538d/kernel/sched/` diff --git a/docs/uri-scheme-browse.rst b/docs/uri-scheme-browse.rst index 6f5b7668..87b637f6 100644 --- a/docs/uri-scheme-browse.rst +++ b/docs/uri-scheme-browse.rst @@ -1,93 +1,93 @@ URI scheme for swh-web Browse application ========================================= This web application aims to provide HTML views to easily navigate in the archive, thus it needs to be reached from a web browser. If you intend to query the archive programmatically through any HTTP client, please refer to the :ref:`swh-web-api-urls` section instead. Context-independent browsing ---------------------------- Context-independent URLs provide information about objects (e.g., revisions, directories, contents, person, ...), independently of the contexts where they have been found (e.g., specific repositories, branches, commits, ...). The following endpoints are the same of the API case (see below), and just render the corresponding information for user consumption. 
Where hyperlinks are created, they always point to other context-independent user URLs: * :http:get:`/browse/content/[(algo_hash):](hash)/`: Display a content * :http:get:`/browse/content/[(algo_hash):](hash)/raw/`: Get / Download content raw data - * :http:get:`/browse/directory/(sha1_git)/[(path)/]`: Browse the content of a directory + * :http:get:`/browse/directory/(sha1_git)/`: Browse the content of a directory * :http:get:`/browse/person/(person_id)/`: Information on a person * :http:get:`/browse/revision/(sha1_git)/`: Browse a revision * :http:get:`/browse/revision/(sha1_git)/log/`: Browse history log heading to a revision Context-dependent browsing -------------------------- Context-dependent URLs provide information about objects, limited to specific contexts where the objects have been found. For instance, instead of having to specify a (root) revision by **sha1_git**, users might want to specify a place and a time. In Software Heritage a "place" is an origin, with an optional branch name; a "time" is a timestamp at which some place has been observed by Software Heritage crawlers. Wherever a revision context is expected in a path (i.e., a **/browse/revision/(sha1_git)/** path fragment) we can put in its stead a path fragment of the form **/browse/origin/?origin_url=(origin_url)&timestamp=(timestamp)&branch=(branch)**. 
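As an illustration, a fragment of this kind could be assembled with a small helper like the one below. This is a minimal sketch only: the function name ``origin_browse_fragment`` is hypothetical and not part of swh-web, and it assumes the three query parameters ``origin_url``, ``timestamp`` and ``branch``, of which only ``origin_url`` is required.

```python
from urllib.parse import urlencode


def origin_browse_fragment(origin_url, timestamp=None, branch=None):
    """Build a /browse/origin/ fragment (hypothetical helper, not in swh-web).

    Only the parameters that are actually provided end up in the query
    string, so the archive can apply its defaults for the missing ones.
    """
    params = {"origin_url": origin_url}
    if timestamp is not None:
        params["timestamp"] = timestamp
    if branch is not None:
        params["branch"] = branch
    # urlencode percent-encodes reserved characters such as ':' and '/'
    return "/browse/origin/?" + urlencode(params)


print(origin_browse_fragment("https://github.com/user/repo", branch="master"))
```

Omitting ``timestamp`` or ``branch`` simply leaves the corresponding parameter out of the query string.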
Such a fragment is resolved, internally by the archive, to a revision **sha1_git** as follows: - if **timestamp** is not given as query parameter: look for the most recent crawl of origin identified by **origin_url** - if **timestamp** is given: look for the closest crawl of origin identified by **origin_url** from timestamp **timestamp** - if **branch** is given as a query parameter: look for the branch **branch** - if **branch** is absent: look for branch "HEAD" or "master" - return the revision **sha1_git** pointed by the chosen branch The already mentioned URLs for revision contexts can therefore be alternatively specified by users as: * :http:get:`/browse/origin/directory/` * :http:get:`/browse/origin/content/` * :http:get:`/browse/origin/log/` Typing: - **origin_url** corresponds to the URL the origin was crawled from, for instance https://github.com/(user)/(repo)/ - **branch** name is given as per the corresponding VCS (e.g., Git) as a query parameter to the requested URL. - **timestamp** is given in a format as liberal as possible, to uphold the principle of least surprise. At the very minimum it is possible to enter timestamps as: - Unix epoch timestamp (see for instance the output of `date +%s`) - ISO 8601 timestamps (see for instance the output of `date -I`, `date -Is`) - YYYY[MM[DD[HH[MM[SS]]]]] ad-hoc format - YYYY[-MM[-DD[ HH:[MM:[SS:]]]]] ad-hoc format swh-web Browse Urls ------------------- .. include:: uri-scheme-browse-content.rst .. include:: uri-scheme-browse-directory.rst .. include:: uri-scheme-browse-origin.rst .. include:: uri-scheme-browse-person.rst .. include:: uri-scheme-browse-release.rst .. include:: uri-scheme-browse-revision.rst .. 
include:: uri-scheme-browse-snapshot.rst diff --git a/swh/web/assets/src/bundles/webapp/xss-filtering.js b/swh/web/assets/src/bundles/webapp/xss-filtering.js index cf79d39d..72b47bf9 100644 --- a/swh/web/assets/src/bundles/webapp/xss-filtering.js +++ b/swh/web/assets/src/bundles/webapp/xss-filtering.js @@ -1,42 +1,42 @@ /** - * Copyright (C) 2019 The Software Heritage developers + * Copyright (C) 2019-2020 The Software Heritage developers * See the AUTHORS file at the top-level directory of this distribution * License: GNU Affero General Public License version 3, or any later version * See top-level LICENSE file for more information */ import DOMPurify from 'dompurify'; // we register a hook when performing XSS filtering in order to // possibly replace a relative image url with the one for getting // the image bytes from the archive content DOMPurify.addHook('uponSanitizeAttribute', function(node, data) { if (node.nodeName === 'IMG' && data.attrName === 'src') { // image url does not need any processing here if (data.attrValue.startsWith('data:image') || data.attrValue.startsWith('http:') || data.attrValue.startsWith('https:')) { return; } // get currently browsed swh object metadata let swhObjectMetadata = swh.webapp.getBrowsedSwhObjectMetadata(); // the swh object is provided without any useful context // to get the image checksums from the web api if (!swhObjectMetadata.hasOwnProperty('directory')) { return; } // used internal endpoint as image url to possibly get the image data // from the archive content - let url = Urls.browse_directory_resolve_content_path(swhObjectMetadata.directory, - data.attrValue); + let url = Urls.browse_directory_resolve_content_path(swhObjectMetadata.directory); + url += `?path=${data.attrValue}`; data.attrValue = url; } }); export function filterXSS(html) { return DOMPurify.sanitize(html, {SAFE_FOR_JQUERY: true}); } diff --git a/swh/web/browse/views/content.py b/swh/web/browse/views/content.py index f8f19c85..91c49e09 100644 --- 
a/swh/web/browse/views/content.py +++ b/swh/web/browse/views/content.py @@ -1,354 +1,355 @@ -# Copyright (C) 2017-2019 The Software Heritage developers +# Copyright (C) 2017-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import difflib import json from distutils.util import strtobool from django.http import HttpResponse from django.shortcuts import render from django.template.defaultfilters import filesizeformat import sentry_sdk from swh.model.hashutil import hash_to_hex from swh.web.browse.browseurls import browse_route from swh.web.browse.snapshot_context import get_snapshot_context from swh.web.browse.utils import ( request_content, prepare_content_for_display, content_display_max_size, get_swh_persistent_ids, gen_link, gen_directory_link, ) from swh.web.common import query, service, highlightjs from swh.web.common.exc import NotFoundExc, handle_view_exception from swh.web.common.utils import reverse, gen_path_info, swh_object_icons @browse_route( r"content/(?P<query_string>[0-9a-z_:]*[0-9a-f]+.)/raw/", view_name="browse-content-raw", checksum_args=["query_string"], ) def content_raw(request, query_string): """Django view that produces a raw display of a content identified by its hash value. 
The url that points to it is :http:get:`/browse/content/[(algo_hash):](hash)/raw/` """ try: re_encode = bool(strtobool(request.GET.get("re_encode", "false"))) algo, checksum = query.parse_hash(query_string) checksum = hash_to_hex(checksum) content_data = request_content(query_string, max_size=None, re_encode=re_encode) except Exception as exc: return handle_view_exception(request, exc) filename = request.GET.get("filename", None) if not filename: filename = "%s_%s" % (algo, checksum) if ( content_data["mimetype"].startswith("text/") or content_data["mimetype"] == "inode/x-empty" ): response = HttpResponse(content_data["raw_data"], content_type="text/plain") response["Content-disposition"] = "filename=%s" % filename else: response = HttpResponse( content_data["raw_data"], content_type="application/octet-stream" ) response["Content-disposition"] = "attachment; filename=%s" % filename return response _auto_diff_size_limit = 20000 @browse_route( r"content/(?P<from_query_string>.*)/diff/(?P<to_query_string>.*)", view_name="diff-contents", ) def _contents_diff(request, from_query_string, to_query_string): """ Browse endpoint used to compute unified diffs between two contents. Diffs are generated only if the two contents are textual. By default, diffs whose size are greater than 20 kB will not be generated. To force the generation of large diffs, the 'force' boolean query parameter must be used. Args: request: input django http request from_query_string: a string of the form "[ALGO_HASH:]HASH" where optional ALGO_HASH can be either ``sha1``, ``sha1_git``, ``sha256``, or ``blake2s256`` (default to ``sha1``) and HASH the hexadecimal representation of the hash value identifying the first content to_query_string: same as above for identifying the second content Returns: A JSON object containing the unified diff. 
""" diff_data = {} content_from = None content_to = None content_from_size = 0 content_to_size = 0 content_from_lines = [] content_to_lines = [] force = request.GET.get("force", "false") path = request.GET.get("path", None) language = "nohighlight" force = bool(strtobool(force)) if from_query_string == to_query_string: diff_str = "File renamed without changes" else: try: text_diff = True if from_query_string: content_from = request_content(from_query_string, max_size=None) content_from_display_data = prepare_content_for_display( content_from["raw_data"], content_from["mimetype"], path ) language = content_from_display_data["language"] content_from_size = content_from["length"] if not ( content_from["mimetype"].startswith("text/") or content_from["mimetype"] == "inode/x-empty" ): text_diff = False if text_diff and to_query_string: content_to = request_content(to_query_string, max_size=None) content_to_display_data = prepare_content_for_display( content_to["raw_data"], content_to["mimetype"], path ) language = content_to_display_data["language"] content_to_size = content_to["length"] if not ( content_to["mimetype"].startswith("text/") or content_to["mimetype"] == "inode/x-empty" ): text_diff = False diff_size = abs(content_to_size - content_from_size) if not text_diff: diff_str = "Diffs are not generated for non textual content" language = "nohighlight" elif not force and diff_size > _auto_diff_size_limit: diff_str = "Large diffs are not automatically computed" language = "nohighlight" else: if content_from: content_from_lines = ( content_from["raw_data"].decode("utf-8").splitlines(True) ) if content_from_lines and content_from_lines[-1][-1] != "\n": content_from_lines[-1] += "[swh-no-nl-marker]\n" if content_to: content_to_lines = ( content_to["raw_data"].decode("utf-8").splitlines(True) ) if content_to_lines and content_to_lines[-1][-1] != "\n": content_to_lines[-1] += "[swh-no-nl-marker]\n" diff_lines = difflib.unified_diff(content_from_lines, content_to_lines) 
diff_str = "".join(list(diff_lines)[2:]) except Exception as exc: sentry_sdk.capture_exception(exc) diff_str = str(exc) diff_data["diff_str"] = diff_str diff_data["language"] = language diff_data_json = json.dumps(diff_data, separators=(",", ": ")) return HttpResponse(diff_data_json, content_type="application/json") @browse_route( r"content/(?P<query_string>[0-9a-z_:]*[0-9a-f]+.)/", view_name="browse-content", checksum_args=["query_string"], ) def content_display(request, query_string): """Django view that produces an HTML display of a content identified by its hash value. The url that points to it is :http:get:`/browse/content/[(algo_hash):](hash)/` """ try: algo, checksum = query.parse_hash(query_string) checksum = hash_to_hex(checksum) content_data = request_content(query_string, raise_if_unavailable=False) origin_url = request.GET.get("origin_url", None) selected_language = request.GET.get("language", None) if not origin_url: origin_url = request.GET.get("origin", None) snapshot_context = None if origin_url: try: snapshot_context = get_snapshot_context(origin_url=origin_url) except NotFoundExc: raw_cnt_url = reverse( "browse-content", url_args={"query_string": query_string} ) error_message = ( "The Software Heritage archive has a content " "with the hash you provided but the origin " "mentioned in your request appears broken: %s. 
" "Please check the URL and try again.\n\n" "Nevertheless, you can still browse the content " "without origin information: %s" % (gen_link(origin_url), gen_link(raw_cnt_url)) ) raise NotFoundExc(error_message) if snapshot_context: snapshot_context["visit_info"] = None except Exception as exc: return handle_view_exception(request, exc) path = request.GET.get("path", None) content = None language = None mimetype = None if content_data["raw_data"] is not None: content_display_data = prepare_content_for_display( content_data["raw_data"], content_data["mimetype"], path ) content = content_display_data["content_data"] language = content_display_data["language"] mimetype = content_display_data["mimetype"] # Override language with user-selected language if selected_language is not None: language = selected_language available_languages = None if mimetype and "text/" in mimetype: available_languages = highlightjs.get_supported_languages() root_dir = None filename = None path_info = None directory_id = None directory_url = None query_params = {"origin_url": origin_url} breadcrumbs = [] if path: split_path = path.split("/") root_dir = split_path[0] filename = split_path[-1] if root_dir != path: path = path.replace(root_dir + "/", "") path = path[: -len(filename)] path_info = gen_path_info(path) dir_url = reverse( "browse-directory", url_args={"sha1_git": root_dir}, query_params=query_params, ) breadcrumbs.append({"name": root_dir[:7], "url": dir_url}) for pi in path_info: + query_params["path"] = pi["path"] dir_url = reverse( "browse-directory", - url_args={"sha1_git": root_dir, "path": pi["path"]}, + url_args={"sha1_git": root_dir}, query_params=query_params, ) breadcrumbs.append({"name": pi["name"], "url": dir_url}) breadcrumbs.append({"name": filename, "url": None}) if path and root_dir != path: try: dir_info = service.lookup_directory_with_path(root_dir, path) directory_id = dir_info["target"] except Exception as exc: return handle_view_exception(request, exc) elif 
root_dir != path: directory_id = root_dir if directory_id: directory_url = gen_directory_link(directory_id) query_params = {"filename": filename} content_raw_url = reverse( "browse-content-raw", url_args={"query_string": query_string}, query_params=query_params, ) content_metadata = { "sha1": content_data["checksums"]["sha1"], "sha1_git": content_data["checksums"]["sha1_git"], "sha256": content_data["checksums"]["sha256"], "blake2s256": content_data["checksums"]["blake2s256"], "mimetype": content_data["mimetype"], "encoding": content_data["encoding"], "size": filesizeformat(content_data["length"]), "language": content_data["language"], "licenses": content_data["licenses"], "filename": filename, "directory": directory_id, "context-independent directory": directory_url, } if filename: content_metadata["filename"] = filename sha1_git = content_data["checksums"]["sha1_git"] swh_ids = get_swh_persistent_ids([{"type": "content", "id": sha1_git}]) heading = "Content - %s" % sha1_git if breadcrumbs: content_path = "/".join([bc["name"] for bc in breadcrumbs]) heading += " - %s" % content_path return render( request, "browse/content.html", { "heading": heading, "swh_object_id": swh_ids[0]["swh_id"], "swh_object_name": "Content", "swh_object_metadata": content_metadata, "content": content, "content_size": content_data["length"], "max_content_size": content_display_max_size, "mimetype": mimetype, "language": language, "available_languages": available_languages, "breadcrumbs": breadcrumbs, "top_right_link": { "url": content_raw_url, "icon": swh_object_icons["content"], "text": "Raw File", }, "snapshot_context": snapshot_context, "vault_cooking": None, "show_actions_menu": True, "swh_ids": swh_ids, "error_code": content_data["error_code"], "error_message": content_data["error_message"], "error_description": content_data["error_description"], }, status=content_data["error_code"], ) diff --git a/swh/web/browse/views/directory.py b/swh/web/browse/views/directory.py index 
39161c5c..04aadf46 100644 --- a/swh/web/browse/views/directory.py +++ b/swh/web/browse/views/directory.py @@ -1,211 +1,229 @@ # Copyright (C) 2017-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import os from django.http import HttpResponse from django.shortcuts import render, redirect from django.template.defaultfilters import filesizeformat import sentry_sdk from swh.web.browse.browseurls import browse_route from swh.web.browse.snapshot_context import get_snapshot_context from swh.web.browse.utils import ( get_directory_entries, get_readme_to_display, get_swh_persistent_ids, gen_link, ) from swh.web.common import service from swh.web.common.exc import handle_view_exception, NotFoundExc from swh.web.common.utils import reverse, gen_path_info -@browse_route( - r"directory/(?P<sha1_git>[0-9a-f]+)/", - r"directory/(?P<sha1_git>[0-9a-f]+)/(?P<path>.+)/", - view_name="browse-directory", - checksum_args=["sha1_git"], -) -def directory_browse(request, sha1_git, path=None): - """Django view for browsing the content of a directory identified - by its sha1_git value. 
- - The url that points to it is - :http:get:`/browse/directory/(sha1_git)/[(path)/]` - """ +def _directory_browse(request, sha1_git, path=None): root_sha1_git = sha1_git try: if path: dir_info = service.lookup_directory_with_path(sha1_git, path) sha1_git = dir_info["target"] dirs, files = get_directory_entries(sha1_git) origin_url = request.GET.get("origin_url", None) if not origin_url: origin_url = request.GET.get("origin", None) snapshot_context = None if origin_url: try: snapshot_context = get_snapshot_context(origin_url=origin_url) except NotFoundExc: raw_dir_url = reverse( "browse-directory", url_args={"sha1_git": sha1_git} ) error_message = ( "The Software Heritage archive has a directory " "with the hash you provided but the origin " "mentioned in your request appears broken: %s. " "Please check the URL and try again.\n\n" "Nevertheless, you can still browse the directory " "without origin information: %s" % (gen_link(origin_url), gen_link(raw_dir_url)) ) raise NotFoundExc(error_message) if snapshot_context: snapshot_context["visit_info"] = None except Exception as exc: return handle_view_exception(request, exc) path_info = gen_path_info(path) query_params = {"origin_url": origin_url} breadcrumbs = [] breadcrumbs.append( { "name": root_sha1_git[:7], "url": reverse( "browse-directory", url_args={"sha1_git": root_sha1_git}, query_params=query_params, ), } ) for pi in path_info: breadcrumbs.append( { "name": pi["name"], "url": reverse( "browse-directory", - url_args={"sha1_git": root_sha1_git, "path": pi["path"]}, - query_params=query_params, + url_args={"sha1_git": root_sha1_git}, + query_params={"path": pi["path"], **query_params}, ), } ) path = "" if path is None else (path + "/") for d in dirs: if d["type"] == "rev": d["url"] = reverse( "browse-revision", url_args={"sha1_git": d["target"]}, query_params=query_params, ) else: d["url"] = reverse( "browse-directory", - url_args={"sha1_git": root_sha1_git, "path": path + d["name"]}, - 
query_params=query_params, + url_args={"sha1_git": root_sha1_git}, + query_params={"path": path + d["name"], **query_params}, ) sum_file_sizes = 0 readmes = {} for f in files: query_string = "sha1_git:" + f["target"] f["url"] = reverse( "browse-content", url_args={"query_string": query_string}, query_params={ "path": root_sha1_git + "/" + path + f["name"], "origin_url": origin_url, }, ) if f["length"] is not None: sum_file_sizes += f["length"] f["length"] = filesizeformat(f["length"]) if f["name"].lower().startswith("readme"): readmes[f["name"]] = f["checksums"]["sha1"] readme_name, readme_url, readme_html = get_readme_to_display(readmes) sum_file_sizes = filesizeformat(sum_file_sizes) dir_metadata = { "directory": sha1_git, "number of regular files": len(files), "number of subdirectories": len(dirs), "sum of regular file sizes": sum_file_sizes, } vault_cooking = { "directory_context": True, "directory_id": sha1_git, "revision_context": False, "revision_id": None, } swh_objects = [{"type": "directory", "id": sha1_git}] swh_ids = get_swh_persistent_ids( swh_objects=swh_objects, snapshot_context=snapshot_context ) heading = "Directory - %s" % sha1_git if breadcrumbs: dir_path = "/".join([bc["name"] for bc in breadcrumbs]) + "/" heading += " - %s" % dir_path return render( request, "browse/directory.html", { "heading": heading, "swh_object_id": swh_ids[0]["swh_id"], "swh_object_name": "Directory", "swh_object_metadata": dir_metadata, "dirs": dirs, "files": files, "breadcrumbs": breadcrumbs, "top_right_link": None, "readme_name": readme_name, "readme_url": readme_url, "readme_html": readme_html, "snapshot_context": snapshot_context, "vault_cooking": vault_cooking, "show_actions_menu": True, "swh_ids": swh_ids, }, ) @browse_route( - r"directory/resolve/content-path/(?P<sha1_git>[0-9a-f]+)/(?P<path>.+)/", + r"directory/(?P<sha1_git>[0-9a-f]+)/", + view_name="browse-directory", + checksum_args=["sha1_git"], +) +def directory_browse(request, sha1_git): + """Django view for browsing the content of a 
directory identified + by its sha1_git value. + + The url that points to it is + :http:get:`/browse/directory/(sha1_git)/` + """ + return _directory_browse(request, sha1_git, request.GET.get("path")) + + +@browse_route( + r"directory/(?P<sha1_git>[0-9a-f]+)/(?P<path>.+)/", + view_name="browse-directory-legacy", + checksum_args=["sha1_git"], +) +def directory_browse_legacy(request, sha1_git, path): + """Django view for browsing the content of a directory identified + by its sha1_git value. + + The url that points to it is + :http:get:`/browse/directory/(sha1_git)/(path)/` + """ + return _directory_browse(request, sha1_git, path) + + +@browse_route( + r"directory/resolve/content-path/(?P<sha1_git>[0-9a-f]+)/", view_name="browse-directory-resolve-content-path", checksum_args=["sha1_git"], ) -def _directory_resolve_content_path(request, sha1_git, path): +def _directory_resolve_content_path(request, sha1_git): """ Internal endpoint redirecting to data url for a specific file path relative to a root directory. """ try: - path = os.path.normpath(path) + path = os.path.normpath(request.GET.get("path")) if not path.startswith("../"): dir_info = service.lookup_directory_with_path(sha1_git, path) if dir_info["type"] == "file": sha1 = dir_info["checksums"]["sha1"] data_url = reverse( "browse-content-raw", url_args={"query_string": sha1} ) return redirect(data_url) except Exception as exc: sentry_sdk.capture_exception(exc) return HttpResponse(status=404) diff --git a/swh/web/browse/views/revision.py b/swh/web/browse/views/revision.py index e989a195..05876fc2 100644 --- a/swh/web/browse/views/revision.py +++ b/swh/web/browse/views/revision.py @@ -1,590 +1,589 @@ # Copyright (C) 2017-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import hashlib import json import textwrap from django.http import HttpResponse from 
django.shortcuts import render from django.template.defaultfilters import filesizeformat from django.utils.html import escape from django.utils.safestring import mark_safe from swh.model.identifiers import persistent_identifier from swh.web.browse.browseurls import browse_route from swh.web.browse.snapshot_context import get_snapshot_context from swh.web.browse.utils import ( gen_link, gen_revision_link, gen_revision_url, get_revision_log_url, get_directory_entries, gen_directory_link, request_content, prepare_content_for_display, content_display_max_size, gen_snapshot_link, get_readme_to_display, get_swh_persistent_ids, format_log_entries, gen_person_mail_link, ) from swh.web.common import service from swh.web.common.exc import NotFoundExc, handle_view_exception from swh.web.common.utils import ( reverse, format_utc_iso_date, gen_path_info, swh_object_icons, ) def _gen_content_url(revision, query_string, path, snapshot_context): if snapshot_context: query_params = snapshot_context["query_params"] query_params["path"] = path query_params["revision"] = revision["id"] content_url = reverse("browse-origin-content", query_params=query_params) else: content_path = "%s/%s" % (revision["directory"], path) content_url = reverse( "browse-content", url_args={"query_string": query_string}, query_params={"path": content_path}, ) return content_url def _gen_diff_link(idx, diff_anchor, link_text): if idx < _max_displayed_file_diffs: return gen_link(diff_anchor, link_text) else: return link_text # TODO: put in conf _max_displayed_file_diffs = 1000 def _gen_revision_changes_list(revision, changes, snapshot_context): """ Returns a HTML string describing the file changes introduced in a revision. As this string will be displayed in the browse revision view, links to adequate file diffs are also generated. 
Args: revision (str): hexadecimal representation of a revision identifier changes (list): list of file changes in the revision snapshot_context (dict): optional origin context used to reverse the content urls Returns: A string to insert in a revision HTML view. """ changes_msg = [] for i, change in enumerate(changes): hasher = hashlib.sha1() from_query_string = "" to_query_string = "" diff_id = "diff-" if change["from"]: from_query_string = "sha1_git:" + change["from"]["target"] diff_id += change["from"]["target"] + "-" + change["from_path"] diff_id += "-" if change["to"]: to_query_string = "sha1_git:" + change["to"]["target"] diff_id += change["to"]["target"] + change["to_path"] change["path"] = change["to_path"] or change["from_path"] url_args = { "from_query_string": from_query_string, "to_query_string": to_query_string, } query_params = {"path": change["path"]} change["diff_url"] = reverse( "diff-contents", url_args=url_args, query_params=query_params ) hasher.update(diff_id.encode("utf-8")) diff_id = hasher.hexdigest() change["id"] = diff_id panel_diff_link = "#panel_" + diff_id if change["type"] == "modify": change["content_url"] = _gen_content_url( revision, to_query_string, change["to_path"], snapshot_context ) changes_msg.append( "modified: %s" % _gen_diff_link(i, panel_diff_link, change["to_path"]) ) elif change["type"] == "insert": change["content_url"] = _gen_content_url( revision, to_query_string, change["to_path"], snapshot_context ) changes_msg.append( "new file: %s" % _gen_diff_link(i, panel_diff_link, change["to_path"]) ) elif change["type"] == "delete": parent = service.lookup_revision(revision["parents"][0]) change["content_url"] = _gen_content_url( parent, from_query_string, change["from_path"], snapshot_context ) changes_msg.append( "deleted: %s" % _gen_diff_link(i, panel_diff_link, change["from_path"]) ) elif change["type"] == "rename": change["content_url"] = _gen_content_url( revision, to_query_string, change["to_path"], snapshot_context ) 
link_text = change["from_path"] + " → " + change["to_path"] changes_msg.append( "renamed: %s" % _gen_diff_link(i, panel_diff_link, link_text) ) if not changes: changes_msg.append("No changes") return mark_safe("\n".join(changes_msg)) @browse_route( r"revision/(?P<sha1_git>[0-9a-f]+)/diff/", view_name="diff-revision", checksum_args=["sha1_git"], ) def _revision_diff(request, sha1_git): """ Browse internal endpoint to compute revision diff """ try: revision = service.lookup_revision(sha1_git) snapshot_context = None origin_url = request.GET.get("origin_url", None) if not origin_url: origin_url = request.GET.get("origin", None) timestamp = request.GET.get("timestamp", None) visit_id = request.GET.get("visit_id", None) if origin_url: snapshot_context = get_snapshot_context( origin_url=origin_url, timestamp=timestamp, visit_id=visit_id ) except Exception as exc: return handle_view_exception(request, exc) changes = service.diff_revision(sha1_git) changes_msg = _gen_revision_changes_list(revision, changes, snapshot_context) diff_data = { "total_nb_changes": len(changes), "changes": changes[:_max_displayed_file_diffs], "changes_msg": changes_msg, } diff_data_json = json.dumps(diff_data, separators=(",", ": ")) return HttpResponse(diff_data_json, content_type="application/json") NB_LOG_ENTRIES = 100 @browse_route( r"revision/(?P<sha1_git>[0-9a-f]+)/log/", view_name="browse-revision-log", checksum_args=["sha1_git"], ) def revision_log_browse(request, sha1_git): """ Django view that produces an HTML display of the history log for a revision identified by its id. 
The url that points to it is :http:get:`/browse/revision/(sha1_git)/log/` """ try: per_page = int(request.GET.get("per_page", NB_LOG_ENTRIES)) offset = int(request.GET.get("offset", 0)) revs_ordering = request.GET.get("revs_ordering", "committer_date") session_key = "rev_%s_log_ordering_%s" % (sha1_git, revs_ordering) rev_log_session = request.session.get(session_key, None) rev_log = [] revs_walker_state = None if rev_log_session: rev_log = rev_log_session["rev_log"] revs_walker_state = rev_log_session["revs_walker_state"] if len(rev_log) < offset + per_page: revs_walker = service.get_revisions_walker( revs_ordering, sha1_git, max_revs=offset + per_page + 1, state=revs_walker_state, ) rev_log += [rev["id"] for rev in revs_walker] revs_walker_state = revs_walker.export_state() revs = rev_log[offset : offset + per_page] revision_log = service.lookup_revision_multiple(revs) request.session[session_key] = { "rev_log": rev_log, "revs_walker_state": revs_walker_state, } except Exception as exc: return handle_view_exception(request, exc) revs_ordering = request.GET.get("revs_ordering", "") prev_log_url = None if len(rev_log) > offset + per_page: prev_log_url = reverse( "browse-revision-log", url_args={"sha1_git": sha1_git}, query_params={ "per_page": per_page, "offset": offset + per_page, "revs_ordering": revs_ordering, }, ) next_log_url = None if offset != 0: next_log_url = reverse( "browse-revision-log", url_args={"sha1_git": sha1_git}, query_params={ "per_page": per_page, "offset": offset - per_page, "revs_ordering": revs_ordering, }, ) revision_log_data = format_log_entries(revision_log, per_page) swh_rev_id = persistent_identifier("revision", sha1_git) return render( request, "browse/revision-log.html", { "heading": "Revision history", "swh_object_id": swh_rev_id, "swh_object_name": "Revisions history", "swh_object_metadata": None, "revision_log": revision_log_data, "revs_ordering": revs_ordering, "next_log_url": next_log_url, "prev_log_url": prev_log_url, 
             "breadcrumbs": None,
             "top_right_link": None,
             "snapshot_context": None,
             "vault_cooking": None,
             "show_actions_menu": True,
             "swh_ids": None,
         },
     )


 @browse_route(
     r"revision/(?P<sha1_git>[0-9a-f]+)/",
-    r"revision/(?P<sha1_git>[0-9a-f]+)/(?P<extra_path>.+)/",
     view_name="browse-revision",
     checksum_args=["sha1_git"],
 )
-def revision_browse(request, sha1_git, extra_path=None):
+def revision_browse(request, sha1_git):
     """
     Django view that produces an HTML display of a revision
     identified by its id.

     The url that points to it is :http:get:`/browse/revision/(sha1_git)/`.
     """
     try:
         revision = service.lookup_revision(sha1_git)
         origin_info = None
         snapshot_context = None
         origin_url = request.GET.get("origin_url", None)
         if not origin_url:
             origin_url = request.GET.get("origin", None)
         timestamp = request.GET.get("timestamp", None)
         visit_id = request.GET.get("visit_id", None)
         snapshot_id = request.GET.get("snapshot_id", None)
         path = request.GET.get("path", None)
         dir_id = None
         dirs, files = None, None
         content_data = None
         if origin_url:
             try:
                 snapshot_context = get_snapshot_context(
                     origin_url=origin_url, timestamp=timestamp, visit_id=visit_id
                 )
             except NotFoundExc:
                 raw_rev_url = reverse(
                     "browse-revision", url_args={"sha1_git": sha1_git}
                 )
                 error_message = (
                     "The Software Heritage archive has a revision "
                     "with the hash you provided but the origin "
                     "mentioned in your request appears broken: %s. "
                     "Please check the URL and try again.\n\n"
                     "Nevertheless, you can still browse the revision "
                     "without origin information: %s"
                     % (gen_link(origin_url), gen_link(raw_rev_url))
                 )
                 raise NotFoundExc(error_message)
             origin_info = snapshot_context["origin_info"]
             snapshot_id = snapshot_context["snapshot_id"]
         elif snapshot_id:
             snapshot_context = get_snapshot_context(snapshot_id)

         if path:
             file_info = service.lookup_directory_with_path(revision["directory"], path)
             if file_info["type"] == "dir":
                 dir_id = file_info["target"]
             else:
                 query_string = "sha1_git:" + file_info["target"]
                 content_data = request_content(query_string, raise_if_unavailable=False)
         else:
             dir_id = revision["directory"]

         if dir_id:
             path = "" if path is None else (path + "/")
             dirs, files = get_directory_entries(dir_id)
     except Exception as exc:
         return handle_view_exception(request, exc)

     revision_data = {}

     revision_data["author"] = "None"
     if revision["author"]:
         author_link = gen_person_mail_link(revision["author"])
         revision_data["author"] = author_link
     revision_data["committer"] = "None"
     if revision["committer"]:
         committer_link = gen_person_mail_link(revision["committer"])
         revision_data["committer"] = committer_link
     revision_data["committer date"] = format_utc_iso_date(revision["committer_date"])
     revision_data["date"] = format_utc_iso_date(revision["date"])
     revision_data["directory"] = revision["directory"]
     if snapshot_context:
         revision_data["snapshot"] = snapshot_id
         browse_snapshot_link = gen_snapshot_link(snapshot_id)
         revision_data["context-independent snapshot"] = browse_snapshot_link

     revision_data["context-independent directory"] = gen_directory_link(
         revision["directory"]
     )
     revision_data["revision"] = sha1_git
     revision_data["merge"] = revision["merge"]
     revision_data["metadata"] = escape(
         json.dumps(
             revision["metadata"], sort_keys=True, indent=4, separators=(",", ": ")
         )
     )

     if origin_info:
         revision_data["origin url"] = gen_link(origin_info["url"], origin_info["url"])
         revision_data["context-independent revision"] = gen_revision_link(sha1_git)

     parents = ""
     for p in revision["parents"]:
         parent_link = gen_revision_link(
             p, link_text=None, link_attrs=None, snapshot_context=snapshot_context
         )
         parents += parent_link + "<br/>"

     revision_data["parents"] = mark_safe(parents)
     revision_data["synthetic"] = revision["synthetic"]
     revision_data["type"] = revision["type"]

     message_lines = ["None"]
     if revision["message"]:
         message_lines = revision["message"].split("\n")

     parents = []
     for p in revision["parents"]:
         parent_url = gen_revision_url(p, snapshot_context)
         parents.append({"id": p, "url": parent_url})

     path_info = gen_path_info(path)

     query_params = {
         "snapshot_id": snapshot_id,
         "origin_url": origin_url,
         "timestamp": timestamp,
         "visit_id": visit_id,
     }

     breadcrumbs = []
     breadcrumbs.append(
         {
             "name": revision["directory"][:7],
             "url": reverse(
                 "browse-revision",
                 url_args={"sha1_git": sha1_git},
                 query_params=query_params,
             ),
         }
     )
     for pi in path_info:
         query_params["path"] = pi["path"]
         breadcrumbs.append(
             {
                 "name": pi["name"],
                 "url": reverse(
                     "browse-revision",
                     url_args={"sha1_git": sha1_git},
                     query_params=query_params,
                 ),
             }
         )

     vault_cooking = {
         "directory_context": False,
         "directory_id": None,
         "revision_context": True,
         "revision_id": sha1_git,
     }

     swh_objects = [{"type": "revision", "id": sha1_git}]

     content = None
     content_size = None
     mimetype = None
     language = None
     readme_name = None
     readme_url = None
     readme_html = None
     readmes = {}
     error_code = 200
     error_message = ""
     error_description = ""

     if content_data:
         breadcrumbs[-1]["url"] = None
         content_size = content_data["length"]
         mimetype = content_data["mimetype"]
         if content_data["raw_data"]:
             content_display_data = prepare_content_for_display(
                 content_data["raw_data"], content_data["mimetype"], path
             )
             content = content_display_data["content_data"]
             language = content_display_data["language"]
             mimetype = content_display_data["mimetype"]
         query_params = {}
         if path:
             filename = path_info[-1]["name"]
             query_params["filename"] = path_info[-1]["name"]
             revision_data["filename"] = filename

         top_right_link = {
             "url": reverse(
                 "browse-content-raw",
                 url_args={"query_string": query_string},
                 query_params=query_params,
             ),
             "icon": swh_object_icons["content"],
             "text": "Raw File",
         }

         swh_objects.append({"type": "content", "id": file_info["target"]})

         error_code = content_data["error_code"]
         error_message = content_data["error_message"]
         error_description = content_data["error_description"]
     else:
         for d in dirs:
             if d["type"] == "rev":
                 d["url"] = reverse(
                     "browse-revision", url_args={"sha1_git": d["target"]}
                 )
             else:
                 query_params["path"] = path + d["name"]
                 d["url"] = reverse(
                     "browse-revision",
                     url_args={"sha1_git": sha1_git},
                     query_params=query_params,
                 )
         for f in files:
             query_params["path"] = path + f["name"]
             f["url"] = reverse(
                 "browse-revision",
                 url_args={"sha1_git": sha1_git},
                 query_params=query_params,
             )
             if f["length"] is not None:
                 f["length"] = filesizeformat(f["length"])
             if f["name"].lower().startswith("readme"):
                 readmes[f["name"]] = f["checksums"]["sha1"]

         readme_name, readme_url, readme_html = get_readme_to_display(readmes)

         top_right_link = {
             "url": get_revision_log_url(sha1_git, snapshot_context),
             "icon": swh_object_icons["revisions history"],
             "text": "History",
         }

         vault_cooking["directory_context"] = True
         vault_cooking["directory_id"] = dir_id

         swh_objects.append({"type": "directory", "id": dir_id})

     diff_revision_url = reverse(
         "diff-revision",
         url_args={"sha1_git": sha1_git},
         query_params={
             "origin_url": origin_url,
             "timestamp": timestamp,
             "visit_id": visit_id,
         },
     )

     if snapshot_id:
         swh_objects.append({"type": "snapshot", "id": snapshot_id})

     swh_ids = get_swh_persistent_ids(swh_objects, snapshot_context)

     heading = "Revision - %s - %s" % (
         sha1_git[:7],
         textwrap.shorten(message_lines[0], width=70),
     )
     if snapshot_context:
         context_found = "snapshot: %s" % snapshot_context["snapshot_id"]
         if origin_info:
             context_found = "origin: %s" % origin_info["url"]
         heading += " - %s" % context_found

     return render(
         request,
         "browse/revision.html",
         {
             "heading": heading,
             "swh_object_id": swh_ids[0]["swh_id"],
             "swh_object_name": "Revision",
             "swh_object_metadata": revision_data,
             "message_header": message_lines[0],
             "message_body": "\n".join(message_lines[1:]),
             "parents": parents,
             "snapshot_context": snapshot_context,
             "dirs": dirs,
             "files": files,
             "content": content,
             "content_size": content_size,
             "max_content_size": content_display_max_size,
             "mimetype": mimetype,
             "language": language,
             "readme_name": readme_name,
             "readme_url": readme_url,
             "readme_html": readme_html,
             "breadcrumbs": breadcrumbs,
             "top_right_link": top_right_link,
             "vault_cooking": vault_cooking,
             "diff_revision_url": diff_revision_url,
             "show_actions_menu": True,
             "swh_ids": swh_ids,
             "error_code": error_code,
             "error_message": error_message,
             "error_description": error_description,
         },
         status=error_code,
     )
diff --git a/swh/web/tests/browse/views/test_content.py b/swh/web/tests/browse/views/test_content.py
index b96130dc..69e95d0a 100644
--- a/swh/web/tests/browse/views/test_content.py
+++ b/swh/web/tests/browse/views/test_content.py
@@ -1,388 +1,390 @@
 # Copyright (C) 2017-2020 The Software Heritage developers
 # See the AUTHORS file at the top-level directory of this distribution
 # License: GNU Affero General Public License version 3, or any later version
 # See top-level LICENSE file for more information

 from django.utils.html import escape

 from hypothesis import given

 from swh.web.browse.utils import (
     get_mimetype_and_encoding_for_content,
     prepare_content_for_display,
     _re_encode_content,
 )
 from swh.web.common.exc import NotFoundExc
 from swh.web.common.identifiers import get_swh_persistent_id
 from swh.web.common.utils import gen_path_info, reverse
 from swh.web.tests.django_asserts import (
     assert_contains,
     assert_not_contains,
     assert_template_used,
 )
 from swh.web.tests.strategies import (
     content,
     content_text_non_utf8,
     content_text_no_highlight,
     content_image_type,
     content_text,
     invalid_sha1,
     unknown_content,
     content_utf8_detected_as_binary,
 )


 @given(content_text())
 def test_content_view_text(client, archive_data, content):
     sha1_git = content["sha1_git"]

     url = reverse(
         "browse-content",
         url_args={"query_string": content["sha1"]},
         query_params={"path": content["path"]},
     )

     url_raw = reverse("browse-content-raw", url_args={"query_string": content["sha1"]})

     resp = client.get(url)

     content_display = _process_content_for_display(archive_data, content)
     mimetype = content_display["mimetype"]

     assert resp.status_code == 200
     assert_template_used(resp, "browse/content.html")

     if mimetype.startswith("text/"):
         assert_contains(resp, '' % content_display["language"])
         assert_contains(resp, escape(content_display["content_data"]))
     assert_contains(resp, url_raw)

     swh_cnt_id = get_swh_persistent_id("content", sha1_git)
     swh_cnt_id_url = reverse("browse-swh-id", url_args={"swh_id": swh_cnt_id})
     assert_contains(resp, swh_cnt_id)
     assert_contains(resp, swh_cnt_id_url)


 @given(content_text_no_highlight())
 def test_content_view_text_no_highlight(client, archive_data, content):
     sha1_git = content["sha1_git"]

     url = reverse("browse-content", url_args={"query_string": content["sha1"]})

     url_raw = reverse("browse-content-raw", url_args={"query_string": content["sha1"]})

     resp = client.get(url)

     content_display = _process_content_for_display(archive_data, content)

     assert resp.status_code == 200
     assert_template_used(resp, "browse/content.html")

     assert_contains(resp, '')
     assert_contains(resp, escape(content_display["content_data"]))
     assert_contains(resp, url_raw)

     swh_cnt_id = get_swh_persistent_id("content", sha1_git)
     swh_cnt_id_url = reverse("browse-swh-id", url_args={"swh_id": swh_cnt_id})

     assert_contains(resp, swh_cnt_id)
     assert_contains(resp, swh_cnt_id_url)


 @given(content_text_non_utf8())
 def test_content_view_no_utf8_text(client, archive_data, content):
     sha1_git = content["sha1_git"]

     url = reverse("browse-content", url_args={"query_string": content["sha1"]})

     resp = client.get(url)

     content_display = _process_content_for_display(archive_data, content)

     assert resp.status_code == 200
     assert_template_used(resp, "browse/content.html")
     swh_cnt_id = get_swh_persistent_id("content", sha1_git)
     swh_cnt_id_url = reverse("browse-swh-id", url_args={"swh_id": swh_cnt_id})
     assert_contains(resp, swh_cnt_id_url)
     assert_contains(resp, escape(content_display["content_data"]))


 @given(content_image_type())
 def test_content_view_image(client, archive_data, content):
     url = reverse("browse-content", url_args={"query_string": content["sha1"]})

     url_raw = reverse("browse-content-raw", url_args={"query_string": content["sha1"]})

     resp = client.get(url)

     content_display = _process_content_for_display(archive_data, content)
     mimetype = content_display["mimetype"]
     content_data = content_display["content_data"]

     assert resp.status_code == 200
     assert_template_used(resp, "browse/content.html")
     assert_contains(resp, '' % (mimetype, content_data))
     assert_contains(resp, url_raw)


 @given(content_text())
 def test_content_view_text_with_path(client, archive_data, content):
     path = content["path"]

     url = reverse(
         "browse-content",
         url_args={"query_string": content["sha1"]},
         query_params={"path": path},
     )

     resp = client.get(url)
     assert resp.status_code == 200
     assert_template_used(resp, "browse/content.html")

     assert_contains(resp, '